To get a first feel for ggplot2, let’s try to run some basic ggplot2 commands. The mtcars dataset contains information on 32 cars from a 1973 issue of Motor Trend magazine. This dataset is small, intuitive, and contains a variety of continuous and categorical variables.
Load the ggplot2 package using library().
Use str() to explore the structure of the mtcars dataset.
Hit Submit Answer. This will execute the example code on the right. See if you can understand what ggplot does with the data.
# Load the ggplot2 package
library(ggplot2)
# Explore the mtcars data frame with str()
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
# Execute the following command
ggplot(mtcars, aes(cyl, mpg)) +
geom_point()
The plot from the previous exercise wasn’t really satisfying. Although cyl (the number of cylinders) is categorical, you probably noticed that it is classified as numeric in mtcars. This is really misleading because the representation in the plot doesn’t match the actual data type. You’ll have to explicitly tell ggplot2 that cyl is a categorical variable.
Change the ggplot() command by wrapping factor() around cyl.
Hit Submit Answer and see if the resulting plot is better this time.
# Load the ggplot2 package
library(ggplot2)
# Change the command below so that cyl is treated as factor
ggplot(mtcars, aes(factor(cyl), mpg)) +
geom_point()
Let’s dive a little deeper into the three main topics in this course: The data, aesthetics, and geom layers. We’ll get to making pretty plots in the last chapter with the themes layer.
We’ll continue working on the 32 cars in the mtcars data frame.
Consider how the examples and concepts we discuss throughout these courses apply to your own data-sets!
aes(), add a color argument equal to disp.# Edit to add a color aesthetic mapped to disp
ggplot(mtcars, aes(wt, mpg, color = disp))+
geom_point()
disp to the size aesthetic.# Change the color aesthetic to a size aesthetic
ggplot(mtcars, aes(wt, mpg, size = disp)) +
geom_point()
In the previous exercise you saw that disp can be mapped onto a color gradient or onto a continuous size scale.
Another argument of aes() is the shape of the points. There are a finite number of shapes which ggplot() can automatically assign to the points. However, if you try this command in the console to the right:
ggplot(mtcars, aes(wt, mpg, shape = disp)) +
geom_point()
It gives an error. What does this mean?
g <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
geom_jitter(alpha = 0.5)
g
g +
theme_classic()
The diamonds dataset contains details of 1,000 diamonds. Among the variables included are carat (a measurement of the diamond’s size) and price.
You’ll use two common geom layer functions:
geom_point() adds points (as in a scatter plot).
geom_smooth() adds a smooth trend curve.
As you saw previously, these are added using the + operator.
ggplot(data, aes(x, y)) +
geom_*()
Where * is the specific geometry needed.
diamonds data frame with the str() function.str(diamonds)
## tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
## $ carat : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
## $ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
## $ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
## $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
## $ depth : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
## $ table : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
## $ price : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
## $ x : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
## $ y : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
## $ z : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
+ operator to add geom_point() to the ggplot() command.# Add geom_point() with +
ggplot(diamonds, aes(carat, price)) +
geom_point()
+ operator to add geom_smooth().# Add geom_point() with +
ggplot(diamonds, aes(carat, price)) +
geom_point() +
geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
If you have multiple geoms, then mapping an aesthetic to data variable inside the call to ggplot() will change all the geoms. It is also possible to make changes to individual geoms by passing arguments to the geom_*() functions.
geom_point() has an alpha argument that controls the opacity of the points. A value of 1 (the default) means that the points are totally opaque; a value of 0 means the points are totally transparent (and therefore invisible). Values in between specify transparency.
The plot you drew last time is provided in the script.
color aesthetic to the clarity data variable.# Map the color aesthetic to clarity
ggplot(diamonds, aes(carat, price, color = clarity)) +
geom_point() +
geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
# Make the points 40% opaque
ggplot(diamonds, aes(carat, price, color = clarity)) +
geom_point(alpha = 0.4) +
geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Plots can be saved as variables, which can be added two later on using the + operator. This is really useful if you want to make multiple related plots from a common base.
Using the diamonds dataset, plot the price (y-axis) versus the carat (x-axis), assigning to plt_price_vs_carat.
Using geom_point(), add a point layer to plt_price_vs_carat.
# Draw a ggplot
plt_price_vs_carat <- ggplot(
# Use the diamonds dataset
diamonds,
# For the aesthetics, map x to carat and y to price
aes(x = carat, y = price)
)
# Add a point layer to plt_price_vs_carat
plt_price_vs_carat +
geom_point()
Add an alpha argument to the point layer to make the points 20% opaque, assigning to plt_price_vs_carat_transparent.
Type the plot’s variable name (plt_price_vs_carat_transparent) to display it.
# From previous step
plt_price_vs_carat <- ggplot(diamonds, aes(carat, price))
# Edit this to make points 20% opaque: plt_price_vs_carat_transparent
plt_price_vs_carat_transparent <- plt_price_vs_carat + geom_point(alpha = 0.2)
# See the plot
plt_price_vs_carat_transparent
Inside geom_point(), call aes() and map color to clarity, assigning to plt_price_vs_carat_by_clarity.
Type the plot’s variable name (plt_price_vs_carat_by_clarity) to display it.
# From previous step
plt_price_vs_carat <- ggplot(diamonds, aes(carat, price))
# Edit this to map color to clarity,
# Assign the updated plot to a new object
plt_price_vs_carat_by_clarity <- plt_price_vs_carat +
geom_point(aes(color = clarity))
# See the plot
plt_price_vs_carat_by_clarity